Author - Anand Bhausaheb Kharabe
Task 3: Exploratory Data Analysis - Retail
Level : Beginner
Language-Python
Software-Jupyter Notebook
Aim: To perform "Exploratory Data Analysis" on the dataset "SampleSuperstore"
As a business manager, find the weak areas where work can be done to increase profit.
The dataset can be downloaded from: https://bit.ly/3i4rbWl
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.offline import iplot, init_notebook_mode
%matplotlib inline
init_notebook_mode(connected=True) # Render Plotly figures inside the notebook
warnings.filterwarnings("ignore") # Hide library warnings in the notebook output
t3 = pd.read_csv("SampleSuperstore.csv")
t3.head() # Shows the first five rows of the data from variable t3
t3.tail() # Shows the last five rows of the data from variable t3
t3.shape # Shows the shape of the data as a (rows, columns) tuple
t3.info() # Prints a summary of the DataFrame (column types, non-null counts)
t3.describe() # Shows summary statistics of the numeric columns
t3.columns # Displays the column names of the data
t3['Ship Mode'].unique() # Gives the unique values in the column Ship Mode
t3['Segment'].unique() # Gives the unique values in the column Segment
t3['City'].unique() # Gives the unique values in the column City
t3['State'].unique() # Gives the unique values in the column State
t3['Region'].unique() # Gives the unique values in the column Region
t3['Category'].unique() # Gives the unique values in the column Category
t3['Sub-Category'].unique() # Gives the unique values in the column Sub-Category
t3['Sales'].unique() # Gives the unique values in the column Sales
t3['Quantity'].unique() # Gives the unique values in the column Quantity
t3['Discount'].unique() # Gives the unique values in the column Discount
t3['Profit'].unique() # Gives the unique values in the column Profit
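The per-column `.unique()` calls above can be summarised in one step. The sketch below is a minimal illustration on a small made-up DataFrame `df` (a hypothetical stand-in for `t3` with a few of the same column names, not the real Superstore data); it counts distinct values per column with `nunique()` and collects the values themselves in a dict.

```python
import pandas as pd

# Hypothetical toy stand-in for SampleSuperstore.csv: same column names, made-up values
df = pd.DataFrame({
    "Ship Mode": ["Second Class", "Standard Class", "Standard Class", "First Class"],
    "Region": ["South", "West", "South", "East"],
    "Quantity": [2, 3, 2, 5],
})

# Number of distinct values in every column (one call instead of one .unique() per column)
distinct_counts = df.nunique()
print(distinct_counts)

# The distinct values of every column, in order of first appearance
distinct_values = {col: df[col].unique().tolist() for col in df.columns}
print(distinct_values["Region"])  # ['South', 'West', 'East']
```

On the real data, `t3.nunique()` gives the same overview for all columns at once.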
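t3.isna().sum() # Shows the number of missing (NA) values in each column
t3.isnull() # Element-wise True/False mask of missing values (isnull is an alias of isna)
t3.isna().any() # Shows whether each column contains at least one missing value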
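The missing-value checks above can be combined into a single report. This is a sketch on a hypothetical toy DataFrame `df` with deliberately inserted gaps (not the Superstore data); `isna().mean()` works because the mean of a boolean mask is the share of `True` values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing City and one missing Sales value
df = pd.DataFrame({
    "City": ["Henderson", None, "Fort Worth", "Los Angeles"],
    "Sales": [261.96, 731.94, np.nan, 22.368],
    "Profit": [41.9136, 219.582, -383.031, 2.5164],
})

missing = df.isna().sum()              # count of missing cells per column
missing_pct = df.isna().mean() * 100   # share of missing cells per column, in percent
report = pd.DataFrame({"missing": missing, "percent": missing_pct})
print(report)

has_missing = df.isna().any().any()    # one True/False answer for the whole frame
print(has_missing)
```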
t3.corr(numeric_only=True) # Shows the correlation of the numeric columns with each other
sns.heatmap(t3.corr(numeric_only=True), annot=True)
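The dataset mixes text columns (Region, City, ...) with numeric ones, which is why `numeric_only=True` is passed to `corr()`: recent pandas versions (2.0 and later) raise an error when asked to correlate text columns. A minimal sketch on a hypothetical toy DataFrame, with values chosen so the correlations are exactly +1 and -1:

```python
import pandas as pd

# Hypothetical toy data: one text column plus three numeric columns
df = pd.DataFrame({
    "Region": ["South", "West", "East", "Central"],
    "Sales": [100.0, 200.0, 300.0, 400.0],
    "Quantity": [1, 2, 3, 4],
    "Discount": [0.4, 0.3, 0.2, 0.1],
})

# numeric_only=True drops the text column Region before computing Pearson correlations
corr = df.corr(numeric_only=True)
print(corr.round(2))
```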
sns.pairplot(t3,hue='Region')
# Creating histograms of all numeric columns
t3.hist(figsize=(40, 40), xlabelsize=12, ylabelsize=12)
plt.show()
plt.figure(figsize=(15,15))
sns.countplot(x=t3['State'])
plt.xticks(rotation=90)
plt.title("State")
plt.show()
plt.figure(figsize=(20, 15))
# Bar height = mean Profit per Category (seaborn's default estimator), with an error bar
sns.barplot(data=t3, x='Category', y='Profit')
plt.xticks(rotation=70)
plt.title('Category/Profit')
plt.xlabel('Category')
plt.ylabel('Profit')
plt.show()
plt.figure(figsize=(20, 15))
# Bar height = mean Profit per Sub-Category (seaborn's default estimator)
sns.barplot(data=t3, x='Sub-Category', y='Profit')
plt.xticks(rotation=70)
plt.title('Sub-Category/Profit')
plt.xlabel('Sub-Category')
plt.ylabel('Profit')
plt.show()
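Because `sns.barplot` averages the rows in each group by default, its bars show mean profit per order line rather than total profit. The sketch below reproduces both numbers with `groupby` on a hypothetical toy DataFrame (made-up values, not the Superstore data), so the bar heights can be checked and the total loss per sub-category read off directly.

```python
import pandas as pd

# Hypothetical toy data: several order lines per sub-category
df = pd.DataFrame({
    "Sub-Category": ["Tables", "Tables", "Phones", "Phones", "Phones"],
    "Profit": [-100.0, 20.0, 50.0, 10.0, 30.0],
})

# What sns.barplot draws by default: the mean Profit of each group
mean_profit = df.groupby("Sub-Category")["Profit"].mean()
# Total Profit per group, sorted so the biggest loss-makers come first
total_profit = df.groupby("Sub-Category")["Profit"].sum().sort_values()
print(mean_profit)
print(total_profit)
```

To make the seaborn chart itself show totals, `estimator=np.sum` can be passed to `sns.barplot`.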
plt.figure(figsize=(20, 15))
# Bar height = mean Discount per Sub-Category (seaborn's default estimator)
sns.barplot(data=t3, x='Sub-Category', y='Discount')
plt.xticks(rotation=70)
plt.title('Sub-Category/Discount')
plt.xlabel('Sub-Category')
plt.ylabel('Discount')
plt.show()
plt.figure(figsize=(20, 15))
# Bar height = mean Sales per Sub-Category (seaborn's default estimator)
sns.barplot(data=t3, x='Sub-Category', y='Sales')
plt.xticks(rotation=70)
plt.title('Sub-Category/Sales')
plt.xlabel('Sub-Category')
plt.ylabel('Sales')
plt.show()
fig = go.Figure(
data=[go.Bar(x= t3['Sub-Category'],y= t3['Profit'])],
layout_title_text= 'Sub-Category Wise Profit'
)
fig.show()
fig = go.Figure(
data=[go.Bar(x= t3['Category'],y= t3['Profit'])],
layout_title_text= 'Category Wise Profit'
)
fig.show()
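`go.Bar` above receives one value per order line, so many rows share the same category. A common alternative is to aggregate first and plot one bar per category. This sketch, on a hypothetical toy DataFrame, sums profit per Category and expresses each total as a share of overall profit; it stays in pandas so it runs without Plotly.

```python
import pandas as pd

# Hypothetical toy data: several order lines per category
df = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology", "Office Supplies", "Technology"],
    "Profit": [-50.0, 30.0, 120.0, 40.0, 60.0],
})

# One total per category, largest first
by_category = df.groupby("Category")["Profit"].sum().sort_values(ascending=False)
# Each category's share of overall profit, in percent (negative for a net loss)
share_pct = (by_category / by_category.sum() * 100).round(1)
print(pd.DataFrame({"total_profit": by_category, "share_pct": share_pct}))
```

The aggregated series can then be plotted with `go.Bar(x=by_category.index, y=by_category.values)`.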
fig = px.bar(t3, x="Sub-Category", y="Profit", color="Segment", title="Segment Wise Sub-Category/Profit")
fig.show()
The Copiers sub-category has non-negative profit.
The Phones sub-category has a higher profit margin across segments.
The Fasteners sub-category has the lowest margin across segments.
More attention should be given to Supplies and Tables.
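The segment observations above are read off a stacked bar chart; a pivot table gives the same information as exact numbers. A minimal sketch on a hypothetical toy DataFrame (made-up values) that totals profit per Sub-Category and Segment and lists sub-categories losing money in at least one segment:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the dataset
df = pd.DataFrame({
    "Sub-Category": ["Copiers", "Copiers", "Fasteners", "Fasteners", "Tables", "Tables"],
    "Segment": ["Consumer", "Corporate", "Consumer", "Corporate", "Consumer", "Corporate"],
    "Profit": [300.0, 150.0, 2.0, 1.0, -80.0, 25.0],
})

# Total profit for every Sub-Category x Segment combination (0 where a combination is absent)
pivot = df.pivot_table(index="Sub-Category", columns="Segment",
                       values="Profit", aggfunc="sum", fill_value=0)
print(pivot)

# Sub-categories with a net loss in at least one segment
loss_somewhere = pivot.index[(pivot < 0).any(axis=1)].tolist()
print(loss_somewhere)  # ['Tables']
```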
fig = px.bar(t3, x="Sub-Category", y="Profit", color="Region", title="Region wise Sub-Category/Profit")
fig.show()
For Copiers, the East region yields more profit than Central, West and South.
In the Phones sub-category, the East region is more profitable than Central, West and South.
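The region comparisons above can be checked with `idxmax`, which returns, for each sub-category, the region with the highest total profit. This is a sketch on a hypothetical toy DataFrame whose made-up numbers happen to match the pattern described; on the real data the same two lines would be run on `t3`.

```python
import pandas as pd

# Hypothetical toy data: four regions for each of two sub-categories
df = pd.DataFrame({
    "Sub-Category": ["Copiers"] * 4 + ["Phones"] * 4,
    "Region": ["East", "Central", "West", "South"] * 2,
    "Profit": [500.0, 200.0, 300.0, 100.0, 400.0, 150.0, 350.0, 90.0],
})

# Rows = sub-categories, columns = regions, cells = total profit
region_profit = df.groupby(["Sub-Category", "Region"])["Profit"].sum().unstack()
# For each sub-category, the region (column label) with the highest total profit
best_region = region_profit.idxmax(axis=1)
print(best_region)
```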
fig = px.bar(t3, x="Sub-Category", y="Discount", color="Region", title="Region wise Sub-Category/Discount")
fig.show()
A = t3['Region'].value_counts() # Number of rows per region, largest first
A
# Labels are taken from the counts' own index, so each label always matches its value
trace = go.Pie(labels=A.index, values=A.values)
data = [trace]
fig = go.Figure(data=data)
iplot(fig)
A1 = t3['Segment'].value_counts() # Number of rows per segment, largest first
A1
# Labels are taken from the counts' own index, so each label always matches its value
trace = go.Pie(labels=A1.index, values=A1.values)
data = [trace]
fig = go.Figure(data=data)
iplot(fig)
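A pie chart shows proportions, which `value_counts(normalize=True)` returns directly. The sketch below, on a hypothetical toy Segment column, also shows why taking labels from the index is safer than a hand-typed list: `value_counts()` orders by frequency, so a fixed label list only lines up by coincidence.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"Segment": ["Consumer", "Consumer", "Corporate",
                               "Home Office", "Consumer", "Corporate"]})

counts = df["Segment"].value_counts()                # sorted by frequency, largest first
labels, values = counts.index.tolist(), counts.values.tolist()
shares = df["Segment"].value_counts(normalize=True)  # fraction of rows per segment
print(labels, values)
print(shares.round(3))
```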
# Boxplot
fig = px.box(t3, y="Profit",color="Region",title="Boxplot of Profit",template="none")
fig.show()
South has a maximum profit of 3,177.475 and a minimum profit (largest loss) of -3,839.99
West has a maximum profit of 6,719.981 and a minimum profit (largest loss) of -3,399.98
Central has a maximum profit of 8,399.976 and a minimum profit (largest loss) of -3,701.893
East has a maximum profit of 5,039.986 and a minimum profit (largest loss) of -6,599.978
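The extremes quoted above are read off the boxplot's whiskers and outliers; the same figures can be computed exactly with a grouped aggregation. A minimal sketch on a hypothetical toy DataFrame (made-up values, not the numbers above):

```python
import pandas as pd

# Hypothetical toy data for two regions
df = pd.DataFrame({
    "Region": ["South"] * 3 + ["West"] * 3,
    "Profit": [120.0, -300.0, 15.0, 80.0, -50.0, 10.0],
})

# Largest loss, largest profit and median profit per region
stats = df.groupby("Region")["Profit"].agg(["min", "max", "median"])
print(stats)

# Region containing the single largest loss
worst_region = stats["min"].idxmin()
print(worst_region)  # South
```

Running `t3.groupby("Region")["Profit"].agg(["min", "max"])` on the full dataset gives the exact values behind the boxplot.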
# Scatter Plot
px.scatter(t3,x="Sales", y="Profit", color="Region",
title="Sales vs Profit")
# Scatter Plot
px.scatter(t3,x="Discount", y="Profit", color="Region",
title="Discount vs Profit")
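The Discount vs Profit scatter can be summarised numerically by averaging profit at each discount level, since discounts in this kind of data take a small set of discrete values. This sketch uses a hypothetical toy DataFrame; it reports the mean profit per discount level, the lowest discount level with a negative average, and the overall Pearson correlation.

```python
import pandas as pd

# Hypothetical toy data: a few discrete discount levels
df = pd.DataFrame({
    "Discount": [0.0, 0.0, 0.2, 0.2, 0.5, 0.8],
    "Profit": [40.0, 20.0, 10.0, 6.0, -30.0, -90.0],
})

# Mean profit at each discount level
mean_by_discount = df.groupby("Discount")["Profit"].mean()
print(mean_by_discount)

# Lowest discount level at which the average order line loses money
first_loss_discount = mean_by_discount[mean_by_discount < 0].index.min()
# Overall linear association between discount and profit
r = df["Discount"].corr(df["Profit"])
print(first_loss_discount, round(r, 2))
```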
fig = px.bar(t3, x="Region", y="Profit", color="Segment", title="Segment-wise Region/Profit")
fig.show()
fig = px.bar(t3, x="Region", y="Sales", color="Segment", title="Segment-wise Region/Sales")
fig.show()
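The two regional bar charts can be combined into one table of total sales, total profit and profit per unit of sales. A minimal sketch on a hypothetical toy DataFrame; `profit_ratio` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy data for three regions
df = pd.DataFrame({
    "Region": ["South", "South", "West", "West", "East"],
    "Sales": [100.0, 300.0, 500.0, 300.0, 500.0],
    "Profit": [10.0, 30.0, 120.0, 40.0, 75.0],
})

# Total Sales and Profit per region
totals = df.groupby("Region")[["Sales", "Profit"]].sum()
# Profit earned per unit of sales (hypothetical column name)
totals["profit_ratio"] = (totals["Profit"] / totals["Sales"]).round(3)
print(totals)

lowest_sales_region = totals["Sales"].idxmin()
print(lowest_sales_region)  # South
```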
The discount rate on Tables should be reduced, because the graphs show that this sub-category runs at a loss; the same applies to Bookcases and Supplies.
As a business manager, one should pay more attention to the Fasteners sub-category in the South region and try offering some discount there to increase profit.
Sales in the South region are lower than in the other regions, so a business manager should focus on increasing sales there.
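The recommendations above rest on two numbers per sub-category: total profit and average discount. This sketch, on a hypothetical toy DataFrame, uses pandas named aggregation to list the loss-making sub-categories together with their average discount, which is one way to shortlist where a discount cut might be considered.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the dataset
df = pd.DataFrame({
    "Sub-Category": ["Tables", "Tables", "Bookcases", "Phones", "Phones", "Fasteners"],
    "Discount": [0.3, 0.4, 0.2, 0.0, 0.2, 0.0],
    "Profit": [-120.0, -40.0, -15.0, 60.0, 30.0, 2.0],
})

# One row per sub-category: total profit and mean discount
summary = df.groupby("Sub-Category").agg(
    total_profit=("Profit", "sum"),
    avg_discount=("Discount", "mean"),
)

# Loss-making sub-categories, biggest loss first
weak = summary[summary["total_profit"] < 0].sort_values("total_profit")
print(weak)
```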